
Conversation

@sgeraldes sgeraldes commented Jan 20, 2026

Problem

Spec creation was failing with a `max_tokens: 65537 > 64000` error for Claude Opus 4.5. The codebase had magic numbers scattered throughout (62000, 63999, 64000) with no validation against model-specific limits.

Root Cause

  1. Magic numbers: Token limits hardcoded in multiple files
  2. No model validation: Thinking budgets not validated against model-specific limits
  3. SDK bug workaround needed: Issue #8756 - SDK sometimes reduces max_tokens without adjusting thinking budget

Solution

Created a comprehensive model-specific configuration system:

Changes

  • apps/backend/model_limits.json: New configuration file with all Claude 4.5 model limits

    • All models: 64K max_output_tokens (Opus, Sonnet, Haiku 4.5)
    • Safe thinking budget: 60K tokens (leaves 4K buffer for SDK overhead)
    • Documents validation rules and SDK bug workaround
  • apps/backend/phase_config.py: Load limits from config, add validation

    • get_model_max_output_tokens(): Get model's max_tokens limit
    • get_model_max_thinking_tokens(): Get safe thinking budget
    • validate_thinking_budget(): Caps budgets to model limits with warnings
    • All thinking budget calls now validate against model limits (a sketch follows this list)
  • apps/frontend/src/shared/constants/models.ts: Add model limit constants

    • MODEL_OUTPUT_LIMITS: 64K for all models
    • MODEL_MAX_THINKING: 60K safe limit
    • Updated THINKING_BUDGET_MAP ultrathink to 60K
  • tests/test_model_limits.py: New comprehensive tests (10 tests)

    • Validates all models have correct limits
    • Tests budget capping for excessive values
    • Tests API constraint (thinking < max_tokens)
    • Tests 4K+ buffer for SDK overhead
  • tests/test_thinking_level_validation.py: Updated ultrathink budget to 60K
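
As a rough sketch of the phase_config.py pieces above (the function names come from this PR's summary; the bodies, fallback values, and log messages are assumptions, not the merged implementation):

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def _load_model_limits() -> dict:
    """Load model limits from model_limits.json, falling back to safe defaults."""
    limits_file = Path(__file__).parent / "model_limits.json"
    try:
        with open(limits_file, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        logger.warning("model_limits.json missing or invalid; using defaults")
        # Assumed fallback shape: one "default" entry covering all models.
        return {"default": {"max_output_tokens": 64000, "max_thinking_tokens": 60000}}


_MODEL_LIMITS = _load_model_limits()


def get_model_max_output_tokens(model_id: str) -> int:
    """Return the model's max_tokens limit (64,000 for all Claude 4.5 models)."""
    return _MODEL_LIMITS.get(model_id, _MODEL_LIMITS["default"])["max_output_tokens"]


def get_model_max_thinking_tokens(model_id: str) -> int:
    """Return the safe thinking budget (60,000: the 64K limit minus a 4K buffer)."""
    return _MODEL_LIMITS.get(model_id, _MODEL_LIMITS["default"])["max_thinking_tokens"]


def validate_thinking_budget(budget: int, model_id: str) -> int:
    """Cap an excessive thinking budget to the model's safe limit, with a warning."""
    max_thinking = get_model_max_thinking_tokens(model_id)
    if budget > max_thinking:
        logger.warning(
            "Thinking budget %d exceeds %s limit %d; capping",
            budget, model_id, max_thinking,
        )
        return max_thinking
    return budget
```

Call sites would then pass the active model ID so an ultrathink request is capped at 60,000 instead of the old 63,999.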

Technical Details

  • API constraint: `max_tokens > thinking.budget_tokens` (strictly greater)
  • All Claude 4.5 models: 64K max output, 200K context window
  • Safe thinking budget: 60K tokens (4K buffer for SDK overhead)
  • Graceful degradation: Warns and caps excessive budgets instead of failing

Why 60,000 instead of 63,999?

  • Model limit: 64,000 max_tokens
  • SDK overhead: ~4,000 tokens buffer needed
  • Safe budget: 60,000 tokens
  • Prevents the `max_tokens: 65537 > 64000` error (see the arithmetic sketch below)
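
Spelled out as a check (a small sketch; the 4,000-token buffer size is this PR's stated allowance for SDK overhead):

```python
MODEL_MAX_OUTPUT = 64_000    # max_tokens limit for all Claude 4.5 models
SDK_OVERHEAD_BUFFER = 4_000  # headroom reserved for SDK overhead

SAFE_THINKING_BUDGET = MODEL_MAX_OUTPUT - SDK_OVERHEAD_BUFFER  # 60_000

# The API requires max_tokens > thinking.budget_tokens (strictly greater).
assert SAFE_THINKING_BUDGET < MODEL_MAX_OUTPUT

# Even if the SDK shaves tokens off max_tokens without touching the thinking
# budget (issue #8756), reductions of up to 3,999 tokens still satisfy the
# strictly-greater constraint:
reduced_max_tokens = MODEL_MAX_OUTPUT - 3_999
assert reduced_max_tokens > SAFE_THINKING_BUDGET
```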

Testing

✅ All 19 tests pass (9 existing + 10 new)

  • Validates budget capping
  • Validates API constraints
  • Validates buffer requirements
  • Backward compatibility maintained

References

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Model-specific token limits now enforced across the system (64,000 output, 60,000 thinking budget)
    • Added validation to cap thinking budgets within model constraints and log warnings when limits are exceeded
  • Tests

    • Added comprehensive test coverage for model limits and thinking budget validation


Commit message:

Problem:
- Spec creation failing with "max_tokens: 65537 > 64000" error for Opus 4.5
- Magic numbers scattered across codebase (62000, 63999, 64000)
- No validation against model-specific limits
- SDK bug #8756 causes intermittent validation errors when max_tokens
  is reduced without adjusting thinking budget

Solution:
- Created model_limits.json configuration file with all Claude 4.5 model limits
- All models have 64K max_output_tokens (Opus, Sonnet, Haiku 4.5)
- Set ultrathink budget to 60K (leaves 4K buffer for SDK overhead)
- Added validation functions to cap thinking budgets to model limits
- Updated frontend constants to match backend configuration
- Added comprehensive tests for model-specific validation

Changes:
- apps/backend/model_limits.json: New configuration file with model limits
- apps/backend/phase_config.py: Load limits from config, add validation
- apps/frontend/src/shared/constants/models.ts: Add model limit constants
- tests/test_model_limits.py: New tests for model-specific validation (10 tests)
- tests/test_thinking_level_validation.py: Update ultrathink budget to 60K

Technical Details:
- API constraint: max_tokens > thinking.budget_tokens (strictly greater)
- All Claude 4.5 models: 64K max output, 200K context window
- Safe thinking budget: 60K tokens (4K buffer for SDK overhead)
- Graceful degradation: Warns and caps excessive budgets instead of failing

Testing:
- All 19 tests pass (9 existing + 10 new)
- Validates budget capping, API constraints, and buffer requirements

Implements a workaround for SDK issue anthropics/claude-code#8756

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Sebastian Geraldes <199673787+sebastiangeraldes@users.noreply.github.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@gemini-code-assist
Contributor

Summary of Changes

Hello @sgeraldes, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical issues related to hardcoded token limits for Claude models, which were causing max_tokens errors. By introducing a centralized, model-specific configuration system, the codebase now dynamically manages token limits and thinking budgets, ensuring robust validation and preventing runtime failures. The changes provide a more flexible and resilient approach to handling varying model constraints, improving the overall stability and maintainability of the system.

Highlights

  • Centralized Model Configuration: Introduced a new configuration file, apps/backend/model_limits.json, to centralize and define model-specific token limits, including max output tokens, context window, and safe thinking budgets for Claude 4.5 models.
  • Dynamic Token Limit Validation: Implemented functions in apps/backend/phase_config.py to dynamically load model limits and validate thinking budgets against these limits, preventing API errors caused by exceeding model-specific token constraints. This includes a graceful degradation mechanism that warns and caps excessive budgets.
  • Frontend and Backend Synchronization: Updated frontend constants in apps/frontend/src/shared/constants/models.ts to reflect the new model-specific output and thinking limits, ensuring consistency across the application.
  • Comprehensive Test Coverage: Added a new test suite in tests/test_model_limits.py with 10 new tests to rigorously validate the model limit configuration, budget capping logic, API constraints, and the necessary buffer for SDK overhead. Existing tests were also updated to reflect the new ultrathink budget.
  • SDK Bug Workaround: Incorporated a 4K token safety buffer into the maximum thinking budget (60K instead of 64K) to mitigate issues arising from an SDK bug where max_tokens might be reduced without corresponding adjustments to the thinking budget.

@coderabbitai
Contributor

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

A new centralized model limits configuration (model_limits.json) is added to define token constraints and thinking budgets for Claude model variants. The backend (phase_config.py) is updated to load and enforce these limits dynamically, with new validation functions. Frontend constants are synchronized with backend values, and comprehensive tests validate the model-aware budgeting behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration & Model Limits**<br>`apps/backend/model_limits.json` | New JSON configuration defining model variants (Claude 4.5 families), max output tokens (64,000), max thinking tokens (60,000), thinking levels with budgets (none/low/medium/high/ultrathink at 60,000), and validation rules for token constraints. |
| **Backend Model-Aware Budgeting**<br>`apps/backend/phase_config.py` | Integrated model limits loading with `_load_model_limits()`; added functions `get_model_max_output_tokens()`, `get_model_max_thinking_tokens()`, and `validate_thinking_budget()`; refactored `get_thinking_budget()`, `get_phase_thinking_budget()`, `get_phase_config()`, and `get_spec_phase_thinking_budget()` to accept an optional `model_id` parameter and enforce model-specific constraints; replaced the static `THINKING_BUDGET_MAP` with dynamic derivation from `_MODEL_LIMITS`. |
| **Frontend Model Constraints**<br>`apps/frontend/src/shared/constants/models.ts` | Added `MODEL_OUTPUT_LIMITS` (all models: 64,000) and `MODEL_MAX_THINKING` (all models: 60,000) exports; updated the `THINKING_BUDGET_MAP` ultrathink value from 63,999 to 60,000 with a clarifying comment. |
| **Model Limits Validation Tests**<br>`tests/test_model_limits.py` | New comprehensive test suite validating model output/thinking token limits, budget validation behavior, model-aware capping of excessive budgets, backward compatibility, and SDK buffer requirements (4,000-token minimum gap between max output and max thinking). |
| **Thinking Level Test Updates**<br>`tests/test_thinking_level_validation.py` | Updated ultrathink budget expectation from 63,999 to 60,000 in two test assertions; revised the documentation comment explaining the new 4K SDK buffer rationale. |
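
The walkthrough notes that the static THINKING_BUDGET_MAP is now derived from _MODEL_LIMITS; a plausible shape for that derivation (the ultrathink value is documented in this PR, but the other level budgets below are illustrative placeholders):

```python
# Derive the ultrathink budget from the loaded limits instead of hardcoding 63,999.
_DEFAULTS = _MODEL_LIMITS["default"]  # assumed key, as in the loader sketch above

THINKING_BUDGET_MAP: dict[str, int] = {
    "none": 0,
    "low": 8_000,       # placeholder value, not from this PR
    "medium": 16_000,   # placeholder value, not from this PR
    "high": 32_000,     # placeholder value, not from this PR
    "ultrathink": _DEFAULTS["max_thinking_tokens"],  # 60_000 per model_limits.json
}
```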

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • Issue #1323 — Addresses enforcement of ultrathink thinking budget limits in phase_config.py validation logic.
  • Issue #1212 — Addresses ultrathink budget value alignment between phase_config.py and frontend constants.
  • Issue #1218 — Modifies thinking-budget logic affecting both backend phase_config and frontend model constants.

Possibly related PRs

  • PR #1284 — Also updates ultrathink thinking-budget configuration in the same backend/frontend constants area.
  • PR #1173 — Directly modifies ultrathink maximum to 60,000 across phase_config and test files.

Suggested labels

area/fullstack

Poem

🐰 A rabbit hops through limits neat,
Sixty thousand tokens, ultrathink's treat!
From JSON config to budgets tight,
Models constrained with thinking just right,
Four-K buffer keeps SDK flight! ✨

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: replacing hardcoded token limits with model-specific configuration, which is the core objective and primary change across multiple files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |



@github-actions github-actions bot left a comment


🎉 Thanks for your first PR!

A maintainer will review it soon. Please make sure:

  • Your branch is synced with develop
  • CI checks pass
  • You've followed our contribution guide

Welcome to the Auto Claude community!

@sentry

sentry bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 85.36585% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| apps/backend/phase_config.py | 85.36% | 6 Missing ⚠️ |


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is an excellent improvement that addresses the problem of hardcoded token limits by introducing a model-specific configuration system. The new model_limits.json file centralizes all model constraints, and the backend code in phase_config.py is cleanly refactored to load, validate, and gracefully handle these limits. I particularly appreciate the addition of comprehensive tests in tests/test_model_limits.py, which ensure the new logic is robust and correct. My only suggestion for improvement relates to the duplication of model limits on the frontend, which could be addressed to further improve maintainability.

Comment on lines +35 to 53
```typescript
// Model-specific output token limits (all Claude 4.5 models have 64K max_tokens)
export const MODEL_OUTPUT_LIMITS: Record<string, number> = {
  'claude-opus-4-5-20251101': 64000,
  'claude-sonnet-4-5-20250929': 64000,
  'claude-haiku-4-5-20251001': 64000,
  opus: 64000,
  sonnet: 64000,
  haiku: 64000
} as const;

// Maximum safe thinking budget for each model (leaves buffer for SDK overhead)
export const MODEL_MAX_THINKING: Record<string, number> = {
  'claude-opus-4-5-20251101': 60000,
  'claude-sonnet-4-5-20250929': 60000,
  'claude-haiku-4-5-20251001': 60000,
  opus: 60000,
  sonnet: 60000,
  haiku: 60000
} as const;
```

Severity: medium

While centralizing the model limits on the backend is a great step, these new constants (MODEL_OUTPUT_LIMITS and MODEL_MAX_THINKING) duplicate the configuration from apps/backend/model_limits.json. This creates two sources of truth and could lead to inconsistencies if limits are updated in one place but not the other.

To improve maintainability and ensure consistency, consider creating a backend API endpoint that exposes these model limits. The frontend could then fetch this configuration when the application loads. This would make model_limits.json the single source of truth for the entire application.
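
A minimal sketch of that suggestion, assuming a FastAPI backend (the PR does not state which framework the backend uses, so take the endpoint shape as illustrative):

```python
from fastapi import FastAPI

from phase_config import _load_model_limits  # assumed import path

app = FastAPI()


@app.get("/api/model-limits")
def model_limits() -> dict:
    """Serve model_limits.json so the frontend can fetch a single source of truth."""
    return _load_model_limits()
```

The frontend would fetch this once at startup instead of mirroring the numbers in models.ts.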

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/backend/phase_config.py (1)

1-17: Ruff format check is failing for apps/backend.

CI reports one file needs formatting. Please run `ruff format apps/backend/ --quiet` and commit the changes.

tests/test_thinking_level_validation.py (1)

13-13: Fix import path casing to use lowercase "apps".

Line 13 uses "Apps" but the actual directory is lowercase "apps". On case-sensitive filesystems this will fail. Please align with the repository structure and other tests like test_model_limits.py.

Proposed fix
```diff
- sys.path.insert(0, str(Path(__file__).parent.parent / "Apps" / "backend"))
+ sys.path.insert(0, str(Path(__file__).parent.parent / "apps" / "backend"))
```

Comment on lines +25 to +31
```python
# Load model limits from configuration file
def _load_model_limits() -> dict:
    """Load model limits from model_limits.json."""
    limits_file = Path(__file__).parent / "model_limits.json"
    try:
        with open(limits_file, encoding="utf-8") as f:
            return json.load(f)
```

🛠️ Refactor suggestion | 🟠 Major

Use the platform abstraction module for path handling.

Line 28 constructs paths directly via Path. Per backend guidelines, route path handling through the project’s platform abstraction module.

🤖 Prompt for AI Agents
In `@apps/backend/phase_config.py` around lines 25 - 31, The _load_model_limits
function currently builds the file path via Path(__file__).parent /
"model_limits.json" (limits_file); update it to use the project’s platform
abstraction path API instead: import the platform abstraction module and replace
the limits_file construction with the platform’s path/resource helper (e.g.,
platform.join_path or platform.get_resource_path) so path handling follows
backend guidelines while keeping the same open(..., encoding="utf-8") call and
return json.load(f).

@AndyMik90 AndyMik90 self-assigned this Jan 20, 2026
Owner

@AndyMik90 AndyMik90 left a comment


🤖 Auto Claude PR Review

Merge Verdict: 🔴 BLOCKED

🔴 Blocked - 2 CI check(s) failing. Fix CI before merge.


Risk Assessment

| Factor | Level | Notes |
| --- | --- | --- |
| Complexity | Medium | Based on lines changed |
| Security Impact | None | Based on security findings |
| Scope Coherence | Good | Based on structural review |

🚨 Blocking Issues (Must Fix)

  • CI Failed: Lint Complete
  • CI Failed: Python (Ruff)

Findings Summary

  • Low: 3 issue(s)

Generated by Auto Claude PR Review

Findings (3 selected of 3 total)

🔵 [078e01afefcd] [LOW] [Potential] Frontend/backend configuration duplication requires manual sync

📁 apps/frontend/src/shared/constants/models.ts:26

Token limits (64000, 60000) and thinking level budgets are defined in both apps/backend/model_limits.json (source of truth) and apps/frontend/src/shared/constants/models.ts (lines 27-53). While a comment on line 26 documents the sync requirement, there's no automated validation. This is an acceptable trade-off given the complexity of sharing config between Python/TypeScript, but creates future maintenance burden.

Suggested fix:

Consider adding a CI test that compares frontend constants against backend JSON to catch drift. Alternatively, document the sync requirement more prominently in both files.
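
One way to implement the suggested drift check (a sketch; it assumes the TypeScript file spells the limits as plain numeric literals and that the JSON keys match the fields this PR describes):

```python
import json
from pathlib import Path

REPO_ROOT = Path(__file__).parent.parent


def test_frontend_constants_match_backend_json():
    """Fail CI if models.ts drifts from model_limits.json (crude literal check)."""
    backend = json.loads((REPO_ROOT / "apps/backend/model_limits.json").read_text())
    ts_source = (REPO_ROOT / "apps/frontend/src/shared/constants/models.ts").read_text()
    for model_id, limits in backend.items():
        if not isinstance(limits, dict):  # skip non-model metadata, if any
            continue
        # Assumed JSON keys: max_output_tokens and max_thinking_tokens per model.
        assert str(limits["max_output_tokens"]) in ts_source
        assert str(limits["max_thinking_tokens"]) in ts_source
```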

🔵 [d6f9e65c82a1] [LOW] [Potential] Cross-reference comment uses outdated path

📁 apps/backend/phase_config.py:54

Comment references 'auto-claude-ui/src/shared/constants/models.ts' but the actual frontend path is 'apps/frontend/src/shared/constants/models.ts'. This appears to be a legacy project name.

Suggested fix:

Update comment from 'auto-claude-ui/' to 'apps/frontend/'

🔵 [ea8f8d448fce] [LOW] [Potential] Missing tests for JSON file loading failure scenarios

📁 tests/test_model_limits.py:1

The _load_model_limits() function in phase_config.py handles FileNotFoundError and JSONDecodeError with fallback defaults (lines 32-48), but the test file has no coverage for these error paths. If fallback values are accidentally modified, no test would catch the regression.

Suggested fix:

Add tests that mock file loading failures: (1) Use unittest.mock.patch('builtins.open') to simulate FileNotFoundError, (2) Verify fallback dict is returned, (3) Verify warning is logged.
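
Following that suggestion, the error-path tests might look like this (a sketch; it assumes the fallback simply returns a non-empty defaults dict and logs a warning):

```python
import logging
from unittest.mock import mock_open, patch

import phase_config  # assumes tests put apps/backend on sys.path, as existing tests do


def test_fallback_on_missing_file(caplog):
    with caplog.at_level(logging.WARNING):
        with patch("builtins.open", side_effect=FileNotFoundError):
            limits = phase_config._load_model_limits()
    assert limits  # fallback defaults returned instead of raising
    assert any(r.levelno == logging.WARNING for r in caplog.records)


def test_fallback_on_invalid_json():
    with patch("builtins.open", mock_open(read_data="{not valid json")):
        limits = phase_config._load_model_limits()
    assert limits  # fallback defaults returned despite the JSONDecodeError
```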

This review was generated by Auto Claude.

@sgeraldes
Author

This comment is not valid. That is too much code for a quick patch. Tests were already added for the set of hardcoded magic numbers that were converted into configuration. Simplify and implement quickly.

@sgeraldes
Author

I have read the CLA Document and I hereby sign the CLA

@AndyMik90 AndyMik90 force-pushed the develop branch 2 times, most recently from 67a743f to e83e445 on January 21, 2026 at 14:26